Skewed Base Compositions, Asymmetric Transition Matrices, and Phylogenetic Invariants

Authors

  • Vincent Ferretti
  • B. Franz Lang
  • David Sankoff
Abstract

Evolutionary inference methods that assume equal DNA base compositions and symmetric nucleotide substitution matrices are likely, where these assumptions do not hold, to group species on the basis of similar base compositions rather than true phylogenetic relationships. We propose an invariants-based method for dealing with this problem. An invariant Q_T of a tree T under a k-state Markov model, where a generalized time parameter is identified with the E edges of T, allows us to recognize whether data on N observed species can be associated with the N terminal vertices of T in the sense of having been generated on T rather than on any other tree with N terminals. The form of the generalized time parameter is a positive determinant matrix in some semigroup S of stochastic matrices. The invariance is with respect to the choice of the set of E matrices in S, one associated with each of the E edges of T. We apply a general "empirical" method of finding invariants of a parametrized functional form. It involves calculating the probability f of all k^N data possibilities for each of m sets of E matrices in S to associate with the edges of T, then solving for the parameters using the m equations of form Q(f) = 0. We discuss the problems of finding asymmetric models satisfying the property of semigroup closure, of finding asymmetric models that admit invariants at all, and of the computational complexity of the method. We propose a class of semigroups S_c containing matrices of form [formula: see text] to account for A+T versus G+C asymmetries in DNA base composition. Quadratic invariants are obtained for rooted trees with three and with four terminals. In the latter case the smallest set of algebraically independent invariants is sought. These invariants are applied to data pertaining to fungal evolution and to the origin of mitochondria as bacterial endosymbionts.
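The "empirical" procedure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a 2-state symmetric model stands in for the S_c class (whose matrix form is not reproduced above), the tree is a three-leaf star with a uniform root distribution, and the functional form Q is restricted to homogeneous linear forms rather than the quadratic invariants the authors obtain. The names `sym_matrix`, `pattern_probs`, and `empirical_linear_invariants` are hypothetical.

```python
import itertools
import numpy as np

def sym_matrix(p):
    """2-state symmetric stochastic matrix with change probability p
    (an illustrative stand-in, not the paper's S_c form)."""
    return np.array([[1.0 - p, p], [p, 1.0 - p]])

def pattern_probs(edge_mats, root=(0.5, 0.5)):
    """Probability f of each of the k^N leaf patterns on a star tree
    with one edge (one matrix) per leaf."""
    k = len(root)
    n = len(edge_mats)
    f = []
    for pattern in itertools.product(range(k), repeat=n):
        total = 0.0
        for r in range(k):  # sum over the unobserved root state
            term = root[r]
            for mat, x in zip(edge_mats, pattern):
                term *= mat[r, x]
            total += term
        f.append(total)
    return np.array(f)

def empirical_linear_invariants(m=50, n_leaves=3, seed=0, tol=1e-10):
    """Sample m sets of edge matrices, stack the resulting f vectors,
    and solve the m equations Q(f) = 0 for linear Q(f) = q . f."""
    rng = np.random.default_rng(seed)
    F = np.array([
        pattern_probs([sym_matrix(p) for p in rng.uniform(0.01, 0.49, n_leaves)])
        for _ in range(m)
    ])
    # Right singular vectors with (numerically) zero singular value
    # span the solution space {q : F q = 0}.
    _, s, vt = np.linalg.svd(F)
    rank = int(np.sum(s > tol * s[0]))
    return F, vt[rank:]

F, Q = empirical_linear_invariants()
# Check the candidate invariants on a parameter set not used to find them.
f_new = pattern_probs([sym_matrix(0.1), sym_matrix(0.2), sym_matrix(0.3)])
print(Q.shape, np.max(np.abs(Q @ f_new)))
```

Extending the sketch to a quadratic form would mean replacing each row f by the vector of its degree-2 monomials before taking the null space; the number of columns then grows quadratically in k^N, which gives a sense of the computational complexity issue the abstract raises.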

Similar resources

Toric Ideals of Phylogenetic Invariants

Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an abelian group. ...

Phylogenetic invariants for more general evolutionary models.

An invariant Q of a tree T under a k-state Markov model, where a generalized time parameter is identified with the E edges of T, allows us to recognize whether data on N observed species (usually, N DNA sequences, one from each species) can be associated with the N leaves of T in the sense of having been generated on T rather than on any other N-leaf tree. The form of the generalized time param...

The Strand Symmetric Model

Important special cases of strand symmetric Markov models are the group-based phylogenetic models, including the Jukes-Cantor model and the Kimura 2- and 3-parameter models. The general strand symmetric model, or in this chapter just the strand symmetric model (SSM), has only these eight equalities of probabilities in the transition matrices and no further restriction on the transition probabilities...

q-Cartan matrices and combinatorial invariants of derived categories for skewed-gentle algebras

Cartan matrices are of fundamental importance in representation theory. For algebras defined by quivers (i.e. directed graphs) with relations the computation of the entries of the Cartan matrix amounts to counting nonzero paths in the quivers, leading naturally to a combinatorial setting. In this paper we study a refined version, so-called q-Cartan matrices, where each nonzero path is weighted ...

Counting phylogenetic invariants in some simple cases.

An informal degrees of freedom argument is used to count the number of phylogenetic invariants in cases where we have three or four species and can assume a Jukes-Cantor model of base substitution with or without a molecular clock. A number of simple cases are treated and in each the number of invariants can be found. Two new classes of invariants are found: non-phylogenetic cubic invariants te...

Journal:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 1, Issue 1

Pages: -

Publication date: 1994